Adaptive Sampling for Efficient Softmax Approximation
The softmax function is ubiquitous in machine learning and optimization applications. Computing the full softmax of a matrix-vector product can be computationally expensive in high-dimensional settings. In many applications, however, it suffices to compute only the top few outputs of the softmax. In this work, we present an algorithm, dubbed AdaptiveSoftmax, that adaptively computes the top k softmax values more efficiently than the full softmax computation, with probabilistic guarantees. We demonstrate the sample-efficiency gains afforded by AdaptiveSoftmax on real and synthetic data, corroborating our theoretical results.
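As a rough illustration of the adaptive-sampling idea (not the authors' actual algorithm or its guarantees), the sketch below estimates the logits of a matrix-vector product from random coordinate samples and successively halves the candidate set, so that only the surviving top-k rows ever receive an exact evaluation. The function name, batching scheme, and elimination rule are all illustrative assumptions.

```python
import numpy as np

def adaptive_topk_softmax(A, v, k=3, batch=32, rounds=20, seed=0):
    """Illustrative sketch: estimate the logits A @ v from random coordinate
    samples and successively halve the candidate set of rows."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    sums, counts = np.zeros(n), np.zeros(n)
    active = np.arange(n)
    for _ in range(rounds):
        idx = rng.integers(0, d, size=batch)            # shared coordinate sample
        # Unbiased per-row logit estimate from the sampled coordinates.
        sums[active] += (A[np.ix_(active, idx)] @ v[idx]) * (d / batch)
        counts[active] += 1
        est = sums[active] / counts[active]
        keep = max(k, len(active) // 2)                 # keep the top half
        active = active[np.argsort(est)[-keep:]]
        if len(active) == k:
            break
    est = sums[active] / counts[active]
    topk = active[np.argsort(est)[-k:]]                 # surviving candidates
    logits = A[topk] @ v                                # exact logits, top-k only
    w = np.exp(logits - logits.max())
    return topk, w / w.sum()
```

A method with the stated probabilistic guarantees would replace this fixed halving schedule with statistically grounded stopping rules.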
Gated Slot Attention for Efficient Linear-Time Sequence Modeling
Linear attention Transformers and their gated variants, celebrated for enabling parallel training and efficient recurrent inference, still fall short in recall-intensive tasks compared to traditional Transformers and demand significant resources to train from scratch. This paper introduces Gated Slot Attention (GSA), which enhances Attention with Bounded-Memory Control (ABC [63]) by incorporating a gating mechanism inspired by Gated Linear Attention (GLA [96]). Essentially, GSA comprises a two-layer GLA linked via softmax, using context-aware memory reading and adaptive forgetting to improve memory capacity while maintaining a compact recurrent state size. This design greatly enhances both training and inference efficiency through GLA's hardware-efficient training algorithm and a reduced state size. Additionally, retaining the softmax operation is particularly beneficial in "finetuning pretrained Transformers to RNNs" (T2R [41]) settings, reducing the need for extensive training from scratch. Extensive experiments confirm GSA's superior performance in scenarios requiring in-context recall and in T2R settings.
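To make the "two-layer GLA linked via softmax" description concrete, here is a minimal single-token recurrence in the spirit of GSA. It is a naive per-step sketch under assumed shapes (m slots, dimension d), not the paper's hardware-efficient chunk-parallel implementation, and `gsa_step` is a hypothetical name.

```python
import torch
import torch.nn.functional as F

def gsa_step(q, k, v, alpha, K_slots, V_slots):
    """One recurrent step of a gated-slot-attention-style update (sketch).
    q, k, v: (d,) token projections; alpha: (m,) per-slot forget gates in (0, 1);
    K_slots, V_slots: (m, d) bounded slot memories."""
    # Gated (GLA-style) writes: old slot content decays, the new token is mixed in.
    K_slots = alpha[:, None] * K_slots + (1 - alpha)[:, None] * k[None, :]
    V_slots = alpha[:, None] * V_slots + (1 - alpha)[:, None] * v[None, :]
    # Softmax read over the m slots links the two gated layers.
    scores = F.softmax(K_slots @ q, dim=0)   # (m,) context-aware slot weights
    out = scores @ V_slots                   # (d,) memory read
    return out, K_slots, V_slots
```

The per-slot gates `alpha` play the role of GLA-style adaptive forgetting, while the softmax over slot scores provides the context-aware memory reading.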
Efficient Federated Learning against Heterogeneous and Non-stationary Client Unavailability
Addressing intermittent client availability is critical for the real-world deployment of federated learning algorithms. Most prior work either overlooks the potential non-stationarity in the dynamics of client unavailability or requires substantial memory and computation overhead. We study federated learning in the presence of heterogeneous and non-stationary client availability, which may occur when the deployment environments are uncertain or the clients are mobile. The impact of heterogeneous and non-stationary client unavailability can be significant, as we illustrate using FedAvg, the most widely adopted federated learning algorithm. We propose FedAWE, which includes novel algorithmic structures that (i) compensate for missed computations due to unavailability with only O(1) additional memory and computation relative to standard FedAvg, and (ii) evenly diffuse local updates within the federated learning system through implicit gossiping, despite being agnostic to the non-stationary dynamics. We show that FedAWE converges to a stationary point of even non-convex objectives while achieving the desired linear speedup property. We corroborate our analysis with numerical experiments over diversified client unavailability dynamics on real-world data sets.
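As a loose illustration of the two algorithmic ideas, consider the toy round below: each client keeps a single counter of missed rounds (O(1) extra state) and amortizes the missed work when it reappears, while simple model averaging spreads local progress without modeling the availability dynamics. The names, the quadratic objective, and the catch-up rule are assumptions for illustration; this is not the FedAWE update.

```python
import numpy as np

class ToyClient:
    """Toy client with objective 0.5 * ||x - target||^2 and random availability."""
    def __init__(self, target, p_avail):
        self.target, self.p_avail, self.missed = target, p_avail, 0

def federated_round(x, clients, lr, rng):
    """One illustrative round with intermittent client availability."""
    received = []
    for c in clients:
        if rng.random() > c.p_avail:          # client is unavailable this round
            c.missed += 1                     # O(1) extra state: one counter
            continue
        local_steps = 1 + c.missed            # amortized catch-up for missed rounds
        c.missed = 0
        xi = x.copy()
        for _ in range(local_steps):
            xi -= lr * (xi - c.target)        # gradient of the toy objective
        received.append(xi)
    # Averaging the returned models diffuses progress across clients
    # (implicit gossiping), while staying agnostic to the availability dynamics.
    return np.mean(received, axis=0) if received else x
```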
Deep Depth Estimation from Thermal Image: Dataset, Benchmark, and Challenges
Achieving robust and accurate spatial perception under adverse weather and lighting conditions is crucial for the high-level autonomy of self-driving vehicles and robots. However, existing perception algorithms relying on the visible spectrum are highly affected by weather and lighting conditions. A long-wave infrared camera (i.e., thermal imaging camera) is a potential solution for achieving high-level robustness. However, the absence of large-scale datasets and standardized benchmarks remains a significant bottleneck to progress in active research for robust visual perception from thermal images. In this work, we present a large-scale dataset and standardized benchmark for deep depth estimation from thermal images. Lastly, we provide in-depth analyses and discuss the challenges revealed by the benchmark results, such as the performance variability of each modality under adverse conditions, the domain shift between different sensor modalities, and potential research directions for thermal perception.

Autonomous driving aims to develop intelligent vehicles capable of perceiving their surrounding environments, understanding current contextual information, and making decisions to drive safely without human intervention. Recent advancements in autonomous vehicles, such as Tesla and Waymo, have been driven by deep neural networks and large-scale vehicular datasets, such as KITTI [1], DDAD [2], and nuScenes [3]. However, a major drawback of existing vehicular datasets is their reliance on visible-spectrum images, which are easily affected by weather and lighting conditions such as rain, fog, dust, haze, and low light. Therefore, recent research has actively explored alternative sensors, such as near-infrared (NIR) cameras [8], LiDARs [9], [10], radars [11], [12], and long-wave infrared (LWIR) cameras [13], [14], to achieve reliable and robust visual perception in adverse weather and lighting conditions. Among these sensors, the LWIR camera (i.e., thermal camera) has gained popularity because of its competitive price, robustness to adverse weather, and unique modality information (i.e., temperature).
Efficient Contextual LLM Cascades through Budget-Constrained Policy Learning
Recent successes in natural language processing have led to the proliferation of large language models (LLMs) from multiple providers. Each LLM offering has different inference accuracy, monetary cost, and latency, and its accuracy further depends on the exact wording of the question (i.e., the specific prompt). At the same time, users often face limits on the monetary budget and latency available to answer all of their questions, and they do not know which LLM to choose for each question to meet their accuracy and long-term budget requirements.
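A minimal sketch of the kind of budget-constrained selection policy this problem calls for (purely illustrative; the paper's learned policy, its features, and its cost model are not reproduced here): track the per-question budget implied by the remaining questions, then pick the most accurate model that is still affordable.

```python
from dataclasses import dataclass

@dataclass
class ModelArm:
    name: str
    cost: float          # dollars per query (illustrative)
    est_acc: float       # predicted accuracy for this prompt (illustrative)

def pick_model(arms, remaining_budget, remaining_queries):
    """Hypothetical budget-aware policy: never spend more per query than
    what keeps the remaining questions affordable, then maximize accuracy."""
    per_query = remaining_budget / max(remaining_queries, 1)
    affordable = [a for a in arms if a.cost <= per_query]
    if not affordable:                        # fall back to the cheapest model
        return min(arms, key=lambda a: a.cost)
    return max(affordable, key=lambda a: a.est_acc)
```

A learned policy would additionally condition the accuracy estimates on the specific prompt, which is what makes the selection contextual.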
Analytically deriving Partial Information Decomposition for affine systems of stable and convolution-closed distributions
Bivariate partial information decomposition (PID) has emerged as a promising tool for analyzing interactions in complex systems, particularly in neuroscience. PID achieves this by decomposing the information that two sources (e.g., different brain regions) have about a target (e.g., a stimulus) into unique, redundant, and synergistic terms. However, computing PID remains a challenging problem, often involving optimization over distributions. While several methods have been proposed to compute PID terms numerically, there is a surprising dearth of work on computing them analytically. The only known analytical PID result is for jointly Gaussian distributions. In this work, we present two theoretical advances that enable analytical calculation of the PID terms for numerous well-known distributions, including distributions relevant to neuroscience, such as Poisson, Cauchy, and binomial.
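For reference, any bivariate PID must satisfy the Williams-Beer consistency equations below, which pin down all four terms once any one of them (e.g., the redundancy) is defined. This is standard background, not the paper's new result.

```latex
% Williams-Beer consistency equations for bivariate PID:
% UI = unique, RI = redundant, SI = synergistic information.
\begin{align}
I(T; X_1, X_2) &= UI(T{:}X_1\!\setminus\!X_2) + UI(T{:}X_2\!\setminus\!X_1) + RI + SI\\
I(T; X_1)      &= UI(T{:}X_1\!\setminus\!X_2) + RI\\
I(T; X_2)      &= UI(T{:}X_2\!\setminus\!X_1) + RI
\end{align}
```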
Accenture-NVS1: A Novel View Synthesis Dataset
Sugg, Thomas, O'Brien, Kyle, Poudel, Lekh, Dumouchelle, Alex, Jou, Michelle, Bosch, Marc, Ramanan, Deva, Narasimhan, Srinivasa, Tulsiani, Shubham
This paper introduces ACC-NVS1, a specialized dataset designed for research on Novel View Synthesis specifically for airborne and ground imagery. Data for ACC-NVS1 was collected in Austin, TX and Pittsburgh, PA in 2023 and 2024. The collection encompasses six diverse real-world scenes captured from both airborne and ground cameras, resulting in a total of 148,000 images. ACC-NVS1 addresses challenges such as varying altitudes and transient objects. This dataset is intended to supplement existing datasets, providing additional resources for comprehensive research, rather than serving as a benchmark.
Bridging Multicalibration and Out-of-distribution Generalization Beyond Covariate Shift
We establish a new model-agnostic optimization framework for out-of-distribution generalization via multicalibration, a criterion that ensures a predictor is calibrated across a family of overlapping groups. Multicalibration has been shown to be associated with the robustness of statistical inference under covariate shift. We further establish a link between multicalibration and robustness for prediction tasks both under and beyond covariate shift. We accomplish this by extending multicalibration to incorporate grouping functions that consider covariates and labels jointly. This leads to an equivalence between the extended multicalibration and invariance, an objective for robust learning in the presence of concept shift. We show that the grouping function class has a linear structure spanned by density ratios, yielding a unifying framework for robust learning based on designing specific grouping functions.
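Schematically, and with the conditioning on prediction levels elided for brevity, the extension described above replaces grouping functions of the covariates alone with grouping functions of covariates and labels jointly; the exact definitions in the paper may differ in form.

```latex
% Schematic moment form (conditioning on the prediction level f(X) = v
% is elided); \mathcal{G} is the class of grouping functions.
\begin{align}
\text{standard:} \quad & \mathbb{E}\left[(Y - f(X))\, g(X)\right] = 0
    \quad \forall\, g \in \mathcal{G},\\
\text{extended:} \quad & \mathbb{E}\left[(Y - f(X))\, g(X, Y)\right] = 0
    \quad \forall\, g \in \mathcal{G}.
\end{align}
```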
Adaptive Multi-Fidelity Reinforcement Learning for Variance Reduction in Engineering Design Optimization
Agrawal, Akash, McComb, Christopher
Multi-fidelity Reinforcement Learning (RL) frameworks use computational resources efficiently by integrating analysis models of varying accuracy and cost. The prevailing methodologies, characterized by transfer learning, human-inspired strategies, control-variate techniques, and adaptive sampling, predominantly depend on a structured hierarchy of models. However, this reliance on a model hierarchy can exacerbate variance in policy learning when the underlying models exhibit heterogeneous error distributions across the design space. To address this challenge, this work proposes a novel adaptive multi-fidelity RL framework in which multiple heterogeneous, non-hierarchical low-fidelity models are dynamically leveraged alongside a high-fidelity model to efficiently learn a high-fidelity policy. Specifically, low-fidelity policies and their experience data are used adaptively for efficient targeted learning, guided by their alignment with the high-fidelity policy. The effectiveness of the approach is demonstrated in an octocopter design optimization problem, using two low-fidelity models alongside a high-fidelity simulator. The results show that the proposed approach substantially reduces variance in policy learning, leading to improved convergence and consistently high-quality solutions relative to traditional hierarchical multi-fidelity RL methods. Moreover, the framework eliminates the need to manually tune model-usage schedules, which can otherwise introduce significant computational and operational overhead, positioning it as an effective variance-reduction strategy for multi-fidelity RL.
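One way to picture the alignment-guided use of low-fidelity experience (an illustrative sketch, not the paper's method): score each low-fidelity policy by its agreement with the current high-fidelity policy on a set of probe states, and sample experience from the models in proportion to a softmax over those scores. The function name, the probe-state mechanism, and the temperature are assumptions.

```python
import numpy as np

def model_sampling_weights(hf_policy, lf_policies, probe_states, temp=0.1):
    """Weight each low-fidelity model by how often its greedy action agrees
    with the current high-fidelity policy on a probe set of states."""
    agreement = np.array([
        np.mean([lf(s) == hf_policy(s) for s in probe_states])
        for lf in lf_policies
    ])
    w = np.exp(agreement / temp)              # softmax over agreement scores
    return w / w.sum()                        # sampling weights over models
```

Recomputing these weights as the high-fidelity policy improves is what makes the model usage adaptive rather than following a fixed schedule.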
NAS-Bench-360: Benchmarking Neural Architecture Search on Diverse Tasks
Most existing neural architecture search (NAS) benchmarks and algorithms prioritize well-studied tasks such as image classification on CIFAR or ImageNet. This makes the performance of NAS approaches in more diverse areas poorly understood. In this paper, we present NAS-Bench-360, a benchmark suite for evaluating methods on domains beyond those traditionally studied in architecture search, and use it to address the following question: do state-of-the-art NAS methods perform well on diverse tasks? To construct the benchmark, we curate ten tasks spanning a diverse array of application domains, dataset sizes, problem dimensionalities, and learning objectives. Each task is carefully chosen to interoperate with modern convolutional neural network (CNN) search methods while being far afield from their original development domain. To speed up and reduce the cost of NAS research, for two of the tasks we release the precomputed performance of 15,625 architectures comprising a standard CNN search space. Experimentally, we show the need for the more robust NAS evaluation that NAS-Bench-360 enables by showing that several modern NAS procedures perform inconsistently across the ten tasks, with many catastrophically poor results. We also demonstrate how our benchmark and its associated precomputed results will enable future scientific discoveries by testing whether several recent hypotheses promoted in the NAS literature hold on diverse tasks. NAS-Bench-360 is hosted at https://nb360.ml.cmu.edu/.